ON THE p-LOGARITHMIC AND α-POWER DIVERGENCE
MEASURES IN INFORMATION THEORY

Abstract. In this paper we introduce the concepts of $p$-logarithmic and $\alpha$-power divergence measures and point out a number of basic results.



1. Introduction 



One of the important issues in many applications of Probability Theory is finding an appropriate measure of distance (or difference or discrimination) between two probability distributions. A number of divergence measures for this purpose have been proposed and extensively studied by Jeffreys, Kullback and Leibler, Rényi, Havrda and Charvát, Kapur, Sharma and Mittal, Burbea and Rao, Rao, Lin, Csiszár, Ali and Silvey, Vajda, Shioya and Da-te, and others.

These measures have been applied in a variety of fields such as anthropology, genetics, finance, economics and political science, biology, the analysis of contingency tables [20], approximation of probability distributions, signal processing, and pattern recognition.

Assume that a set $\chi$ and the $\sigma$-finite measure $\mu$ are given. Consider the set of all probability densities on $\mu$ to be

$$\Omega := \left\{ p \;\middle|\; p : \chi \to \mathbb{R},\ p(x) \geq 0,\ \int_{\chi} p(x)\, d\mu(x) = 1 \right\}.$$



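The Kullback-Leibler divergence is well known among the information divergences. It is defined as:

$$D_{KL}(p,q) := \int_{\chi} p(x) \log\left[\frac{p(x)}{q(x)}\right] d\mu(x), \quad p, q \in \Omega, \tag{1.1}$$

where log is to base 2.

In Information Theory and Statistics, various divergences are applied in addition to the Kullback-Leibler divergence. These are the variation distance $D_v$, Hellinger distance $D_H$ [41], $\chi^2$-divergence $D_{\chi^2}$, $\alpha$-divergence $D_{\alpha}$, Bhattacharyya distance $D_B$ [42], Harmonic distance $D_{Ha}$, Jeffreys distance $D_J$, triangular discrimination $D_{\Delta}$ [36], etc. They are defined as follows:

$$D_v(p,q) := \int_{\chi} |p(x) - q(x)|\, d\mu(x), \quad p, q \in \Omega; \tag{1.2}$$

$$D_H(p,q) := \int_{\chi} \left|\sqrt{p(x)} - \sqrt{q(x)}\right| d\mu(x), \quad p, q \in \Omega; \tag{1.3}$$

$$D_{\chi^2}(p,q) := \int_{\chi} p(x) \left[\left(\frac{q(x)}{p(x)}\right)^2 - 1\right] d\mu(x), \quad p, q \in \Omega; \tag{1.4}$$

$$D_{\alpha}(p,q) := \frac{4}{1-\alpha^2} \left[1 - \int_{\chi} [p(x)]^{\frac{1-\alpha}{2}} [q(x)]^{\frac{1+\alpha}{2}}\, d\mu(x)\right], \quad p, q \in \Omega; \tag{1.5}$$

$$D_B(p,q) := \int_{\chi} \sqrt{p(x) q(x)}\, d\mu(x), \quad p, q \in \Omega; \tag{1.6}$$

$$D_{Ha}(p,q) := \int_{\chi} \frac{2 p(x) q(x)}{p(x) + q(x)}\, d\mu(x), \quad p, q \in \Omega; \tag{1.7}$$

$$D_J(p,q) := \int_{\chi} [p(x) - q(x)] \ln\left[\frac{p(x)}{q(x)}\right] d\mu(x), \quad p, q \in \Omega; \tag{1.8}$$

$$D_{\Delta}(p,q) := \int_{\chi} \frac{[p(x) - q(x)]^2}{p(x) + q(x)}\, d\mu(x), \quad p, q \in \Omega. \tag{1.9}$$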
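As a concrete illustration, the sketch below evaluates several of the divergences (1.1)-(1.9) for two discrete distributions. It assumes that $\mu$ is the counting measure on a three-point set, so each integral becomes a finite sum; the function names and the sample vectors `p` and `q` are illustrative choices and do not come from the paper.

```python
import math

# Assumption: mu is the counting measure on a finite set, so every
# integral over chi reduces to a sum over matching entries of p and q.

def kl(p, q):
    """Kullback-Leibler divergence (1.1), logarithm to base 2."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def variation(p, q):
    """Variation distance (1.2)."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def bhattacharyya(p, q):
    """Bhattacharyya distance (1.6)."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

def harmonic(p, q):
    """Harmonic distance (1.7)."""
    return sum(2 * pi * qi / (pi + qi) for pi, qi in zip(p, q) if pi + qi > 0)

def jeffreys(p, q):
    """Jeffreys distance (1.8); requires strictly positive entries."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

def triangular(p, q):
    """Triangular discrimination (1.9)."""
    return sum((pi - qi) ** 2 / (pi + qi) for pi, qi in zip(p, q) if pi + qi > 0)

# Hypothetical sample distributions with strictly positive entries.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

for name, d in [("KL", kl), ("variation", variation),
                ("Bhattacharyya", bhattacharyya), ("harmonic", harmonic),
                ("Jeffreys", jeffreys), ("triangular", triangular)]:
    print(f"{name:14s} {d(p, q):.6f}")
```

Note that the Bhattacharyya and Harmonic quantities equal 1 when the two distributions coincide and decrease as they separate, which is the behaviour exploited later in the paper.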



For other divergence measures, see the paper by Kapur or the online book by Taneja. For a comprehensive collection of preprints available online, see the RGMIA web site http://rgmia.vu.edu.au/papersinfth.html
The Csiszár $f$-divergence is defined as follows:

$$D_f(p,q) := \int_{\chi} p(x) f\left[\frac{q(x)}{p(x)}\right] d\mu(x), \quad p, q \in \Omega, \tag{1.10}$$

where $f$ is convex on $(0, \infty)$. It is assumed that $f(u)$ is zero and strictly convex at $u = 1$. By appropriately defining this convex function, various divergences are derived. All the above distances (1.1)-(1.9) are particular instances of the Csiszár $f$-divergence. There are also many others which are not in this class. For the basic properties of the Csiszár $f$-divergence see [44]-[46].
Lin and Wong introduced the following divergence:

$$D_{LW}(p,q) := \int_{\chi} p(x) \log\left[\frac{p(x)}{\frac{1}{2} p(x) + \frac{1}{2} q(x)}\right] d\mu(x), \quad p, q \in \Omega. \tag{1.11}$$

This can be represented as follows, using the Kullback-Leibler divergence:

$$D_{LW}(p,q) = D_{KL}\left(p, \tfrac{1}{2} p + \tfrac{1}{2} q\right).$$

Lin and Wong have established the following inequalities:

$$D_{LW}(p,q) \leq \frac{1}{2} D_{KL}(p,q); \tag{1.12}$$

$$D_{LW}(p,q) + D_{LW}(q,p) \leq D_v(p,q) \leq 2; \tag{1.13}$$

$$D_{LW}(p,q) \leq 1. \tag{1.14}$$






Shioya and Da-te improved (1.12)-(1.14) by showing that

$$D_{LW}(p,q) \leq \frac{1}{2} D_v(p,q) \leq 1.$$
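The Lin-Wong divergence and the bounds quoted above can be checked numerically. The following is a minimal sketch, assuming the counting measure on a finite set and using hypothetical sample distributions; it illustrates the inequalities on one example and is not a proof.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence with logarithm to base 2."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def lin_wong(p, q):
    """Lin-Wong divergence: KL divergence from p to the midpoint (p + q) / 2."""
    midpoint = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return kl(p, midpoint)

def variation(p, q):
    """Variation distance."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

# Hypothetical sample distributions.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

d_lw = lin_wong(p, q)
print("D_LW(p,q)         =", d_lw)
print("(1/2) D_KL(p,q)   =", 0.5 * kl(p, q))
print("(1/2) D_v(p,q)    =", 0.5 * variation(p, q))
print("D_LW(p,q)+D_LW(q,p) =", d_lw + lin_wong(q, p))
```

On this example the Shioya and Da-te bound $\frac{1}{2} D_v$ is visibly tighter than the constant bound 1 of (1.14).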

For classical and new results comparing different kinds of divergence measures, see the papers cited in the references, where further references are given.

2. Some New Divergence Measures 
We define the $p$-logarithmic means by (see p. 346 of the cited reference)

$$L_p(a,b) := \begin{cases} \left[\dfrac{b^{p+1} - a^{p+1}}{(p+1)(b-a)}\right]^{1/p}, & p \neq -1, 0, \\[2mm] \dfrac{b-a}{\ln b - \ln a}, & p = -1, \\[2mm] \dfrac{1}{e}\left(\dfrac{b^b}{a^a}\right)^{\frac{1}{b-a}}, & p = 0, \end{cases} \qquad a \neq b,\ a, b > 0; \qquad L_p(a,a) := a. \tag{2.1}$$

Where convenient, $L_{-1}(a,b)$, the logarithmic mean, will be written as just $L(a,b)$. The case $p = 0$, i.e., $L_0(a,b)$, is also called the identric mean and will be denoted by $I(a,b)$. We also define $L_{+\infty}(a,b) := \max\{a,b\}$ and $L_{-\infty}(a,b) := \min\{a,b\}$ to complete the scale.

It is easily checked that the definitions in the above scale are consistent in the sense that $\lim_{p \to 0} L_p(a,b) = I(a,b)$ and $\lim_{p \to \pm\infty} L_p(a,b) = L_{\pm\infty}(a,b)$.
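The scale of $p$-logarithmic means can be implemented directly from (2.1). The sketch below is one possible implementation; the function name `log_mean` is an illustrative choice, and the identric case is evaluated through logarithms for numerical stability. It also probes the stated limits numerically.

```python
import math

def log_mean(a, b, p):
    """p-logarithmic mean L_p(a, b) of (2.1) for a, b > 0."""
    if a == b:
        return a
    if p == math.inf:
        return max(a, b)
    if p == -math.inf:
        return min(a, b)
    if p == -1:
        # logarithmic mean L(a, b)
        return (b - a) / (math.log(b) - math.log(a))
    if p == 0:
        # identric mean I(a, b) = (1/e) (b^b / a^a)^(1/(b-a)), via logarithms
        return math.exp((b * math.log(b) - a * math.log(a)) / (b - a) - 1)
    return ((b ** (p + 1) - a ** (p + 1)) / ((p + 1) * (b - a))) ** (1 / p)

a, b = 1.0, 2.0
print("I(a,b)          =", log_mean(a, b, 0))
print("L_p, p near 0   =", log_mean(a, b, 1e-6))
print("L(a,b)          =", log_mean(a, b, -1))
print("L_p, p near -1  =", log_mean(a, b, -1 + 1e-6))
print("L_p, p = 200    =", log_mean(a, b, 200))
print("L_p, p = -200   =", log_mean(a, b, -200))
```

The values for large $|p|$ approach $\max\{a,b\}$ and $\min\{a,b\}$ only slowly, so moderate exponents are used here to avoid floating-point overflow.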

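We define the $p$-logarithmic divergence measure, or simply the $L_p$-divergence measure, by

$$D_{L_p}(q,r) := \begin{cases} \displaystyle\int_{\chi} \left[\frac{[q(x)]^{p+1} - [r(x)]^{p+1}}{(p+1)(q(x) - r(x))}\right]^{1/p} d\mu(x), & \text{if } p \neq -1, 0, \\[3mm] \displaystyle\int_{\chi} \frac{q(x) - r(x)}{\ln q(x) - \ln r(x)}\, d\mu(x), & \text{if } p = -1, \\[3mm] \displaystyle\frac{1}{e}\int_{\chi} \left[\frac{[q(x)]^{q(x)}}{[r(x)]^{r(x)}}\right]^{\frac{1}{q(x) - r(x)}} d\mu(x), & \text{if } p = 0, \end{cases} \qquad q, r \in \Omega; \tag{2.2}$$

$$D_{+\infty}(q,r) := \int_{\chi} \max\{q(x), r(x)\}\, d\mu(x), \quad q, r \in \Omega, \tag{2.3}$$

$$D_{-\infty}(q,r) := \int_{\chi} \min\{q(x), r(x)\}\, d\mu(x), \quad q, r \in \Omega. \tag{2.4}$$

We observe that

$$D_{+\infty}(q,r) = \int_{\chi} \frac{q(x) + r(x) + |q(x) - r(x)|}{2}\, d\mu(x) = 1 + \frac{1}{2} D_v(q,r) \tag{2.5}$$

and similarly,

$$D_{-\infty}(q,r) = 1 - \frac{1}{2} D_v(q,r). \tag{2.6}$$

Since $L_p(a,b) = L_p(b,a)$ for all $a, b > 0$ and $p \in [-\infty, \infty]$, we can conclude that the $L_p$-divergence measures are symmetrical.

Now, if we consider the continuous mappings (which are not necessarily convex)

$$f_p(x) := \begin{cases} \left[\dfrac{x^{p+1} - 1}{(p+1)(x-1)}\right]^{1/p}, & x \in (0,1) \cup (1,\infty),\ p \neq -1, 0; \\[2mm] \dfrac{x-1}{\ln x}, & x \in (0,1) \cup (1,\infty),\ p = -1; \\[2mm] \dfrac{1}{e}\, x^{\frac{x}{x-1}}, & x \in (0,1) \cup (1,\infty),\ p = 0; \\[2mm] 1, & \text{if } x = 1, \end{cases} \tag{2.7}$$

$$f_{+\infty}(x) := 1 + \frac{1}{2}|x - 1|, \qquad f_{-\infty}(x) := 1 - \frac{1}{2}|x - 1|,$$

and taking into account that $L_p(a,b) = a L_p\left(1, \frac{b}{a}\right)$ for all $a, b > 0$ and $p \in [-\infty, \infty]$, we deduce that

$$D_{f_p}(q,r) = \int_{\chi} q(x) f_p\left(\frac{r(x)}{q(x)}\right) d\mu(x) = \int_{\chi} q(x) L_p\left(\frac{r(x)}{q(x)}, 1\right) d\mu(x) = \int_{\chi} L_p(r(x), q(x))\, d\mu(x) = D_{L_p}(q,r) \tag{2.8}$$

for all $q, r \in \Omega$, which shows that the $L_p$-divergence measures can be interpreted as Csiszár $f$-divergences for $f = f_p$, which are not necessarily convex.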

The following result is well known in the theory of $p$-logarithmic means (see for example p. 347 of the cited reference).

Lemma 1. We have

$$L_{-2}(a,b) = G(a,b), \tag{2.9}$$

$$L_{-\frac{1}{2}}(a,b) = \frac{1}{2}\left(A(a,b) + G(a,b)\right), \tag{2.10}$$

$$L_1(a,b) = A(a,b), \tag{2.11}$$

$$L_{-3}(a,b) = \left[H(a,b)\, G^2(a,b)\right]^{\frac{1}{3}}, \tag{2.12}$$

and the monotonicity property

$$L_{-\infty}(a,b) \leq L_r(a,b) \leq L_s(a,b) \leq L_{+\infty}(a,b) \tag{2.13}$$

with equality iff $a = b$, where $-\infty < r < s < \infty$, and $A(a,b)$ is the arithmetic mean, $G(a,b)$ is the geometric mean and $H(a,b)$ is the harmonic mean of $a, b$.

In particular, we have

$$G(a,b) \leq L(a,b) \leq \frac{1}{2}\left[A(a,b) + G(a,b)\right] \leq I(a,b) \leq A(a,b) \tag{2.14}$$

with equality iff $a = b$.

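Now, using (2.9)-(2.12), we observe that

$$D_{L_{-2}}(q,r) = \int_{\chi} \sqrt{r(x) q(x)}\, d\mu(x) = D_B(q,r) \quad \text{(Bhattacharyya distance)}, \tag{2.15}$$

$$D_{L_{-\frac{1}{2}}}(q,r) = \frac{1}{2} \int_{\chi} \left[\frac{q(x) + r(x)}{2} + \sqrt{q(x) r(x)}\right] d\mu(x) = \frac{1}{2} + \frac{1}{2} D_B(q,r), \tag{2.16}$$

$$D_{L_{-3}}(q,r) = \int_{\chi} \left[\frac{2 r(x) q(x)}{r(x) + q(x)} \cdot r(x) q(x)\right]^{\frac{1}{3}} d\mu(x) = 2^{\frac{1}{3}} \int_{\chi} \frac{r^{\frac{2}{3}}(x)\, q^{\frac{2}{3}}(x)}{[r(x) + q(x)]^{\frac{1}{3}}}\, d\mu(x) =: D_{[HG^2]^{\frac{1}{3}}}(q,r) \tag{2.17}$$

for all $q, r \in \Omega$.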

Using Lemma 1 we can state the following fundamental theorem regarding the position of the $L_p$-divergence measures.

Theorem 1. For any $q, r \in \Omega$, we have the inequality

$$1 - \frac{1}{2} D_v(r,q) \leq D_{L_u}(r,q) \leq D_{L_s}(r,q) \leq 1 + \frac{1}{2} D_v(r,q) \tag{2.18}$$

for all $-\infty < u < s < \infty$.

In particular, we have

$$1 - \frac{1}{2} D_v(r,q) \leq D_{[HG^2]^{\frac{1}{3}}}(r,q) \leq D_B(r,q) \leq D_L(r,q) \leq \frac{1}{2} + \frac{1}{2} D_B(r,q) \leq D_I(r,q) \leq 1, \tag{2.19}$$

where

$$D_L(r,q) := \int_{\chi} \frac{r(x) - q(x)}{\ln r(x) - \ln q(x)}\, d\mu(x)$$

is the Logarithmic divergence and

$$D_I(r,q) := \frac{1}{e} \int_{\chi} \left[\frac{[q(x)]^{q(x)}}{[r(x)]^{r(x)}}\right]^{\frac{1}{q(x) - r(x)}} d\mu(x)$$

is the Identric divergence.
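The ordering asserted in Theorem 1 can be illustrated on a small discrete example. The sketch below assumes the counting measure on a three-point set and hypothetical sample distributions; `log_mean` and `d_log` are illustrative names for the $p$-logarithmic mean and the corresponding divergence.

```python
import math

def log_mean(a, b, p):
    """p-logarithmic mean L_p(a, b) for a, b > 0 and finite p."""
    if a == b:
        return a
    if p == -1:
        return (b - a) / (math.log(b) - math.log(a))
    if p == 0:
        return math.exp((b * math.log(b) - a * math.log(a)) / (b - a) - 1)
    return ((b ** (p + 1) - a ** (p + 1)) / ((p + 1) * (b - a))) ** (1 / p)

def d_log(q, r, p):
    """L_p-divergence for the counting measure: sum of L_p(q_i, r_i)."""
    return sum(log_mean(qi, ri, p) for qi, ri in zip(q, r))

def variation(q, r):
    return sum(abs(qi - ri) for qi, ri in zip(q, r))

# Hypothetical sample distributions.
q = [0.5, 0.3, 0.2]
r = [0.2, 0.5, 0.3]

orders = [-3, -2, -1, -0.5, 0, 1]
values = [d_log(q, r, p) for p in orders]
lower = 1 - 0.5 * variation(q, r)
upper = 1 + 0.5 * variation(q, r)
bhatt = sum(math.sqrt(qi * ri) for qi, ri in zip(q, r))

print("lower bound:", lower)
for p, v in zip(orders, values):
    print(f"D_L_{p}: {v:.6f}")
print("upper bound:", upper)
```

The orders $-3, -2, -1, -\frac{1}{2}, 0, 1$ correspond in turn to the terms of (2.19), so the printed column should be nondecreasing and end at 1.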



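Remark 1. From (2.18), we can conclude the following inequality for the $L_p$-divergence measure in terms of the variation distance:

$$|D_{L_s}(r,q) - 1| \leq \frac{1}{2} D_v(r,q), \quad r, q \in \Omega, \tag{2.20}$$

for all $s \in [-\infty, \infty]$. The constant $\frac{1}{2}$ is sharp.

Indeed, if we assume that (2.20) holds with another constant $c > 0$, i.e.,

$$|D_{L_s}(r,q) - 1| \leq c\, D_v(r,q), \tag{2.21}$$

then, choosing $s = \infty$, we obtain

$$\left|1 + \frac{1}{2} D_v(r,q) - 1\right| \leq c\, D_v(r,q) \quad \text{for all } r, q \in \Omega,$$

which implies that $c \geq \frac{1}{2}$.

For $\alpha \in \mathbb{R}$, we define the $\alpha$-th power mean of the positive numbers $a, b$ by (see p. 133 of the cited reference)

$$M^{[\alpha]}(a,b) := \begin{cases} \left[\dfrac{a^{\alpha} + b^{\alpha}}{2}\right]^{\frac{1}{\alpha}} & \text{if } \alpha \neq 0,\ \alpha \neq \pm\infty; \\[2mm] \sqrt{ab} & \text{if } \alpha = 0; \\[1mm] \max\{a, b\} & \text{if } \alpha = +\infty; \\[1mm] \min\{a, b\} & \text{if } \alpha = -\infty. \end{cases} \tag{2.22}$$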



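We define the $\alpha$-power divergence measure by

$$D_{M^{[\alpha]}}(p,q) := \begin{cases} \displaystyle\int_{\chi} \left[\frac{p^{\alpha}(x) + q^{\alpha}(x)}{2}\right]^{\frac{1}{\alpha}} d\mu(x) & \text{if } \alpha \neq 0,\ \alpha \neq \pm\infty; \\[3mm] \displaystyle\int_{\chi} \sqrt{p(x) q(x)}\, d\mu(x) & \text{if } \alpha = 0 \text{ (Bhattacharyya distance)}; \\[3mm] 1 + \dfrac{1}{2} D_v(p,q) & \text{if } \alpha = +\infty; \\[2mm] 1 - \dfrac{1}{2} D_v(p,q) & \text{if } \alpha = -\infty. \end{cases} \tag{2.23}$$

Since $M^{[\alpha]}(a,b) = M^{[\alpha]}(b,a)$ for all $a, b > 0$ and $\alpha \in [-\infty, \infty]$, we can conclude that the $\alpha$-power divergences are symmetrical. Now, if we consider the continuous mappings (which are not necessarily convex)

$$f_{\alpha}(x) := \begin{cases} \left[\dfrac{x^{\alpha} + 1}{2}\right]^{\frac{1}{\alpha}} & \text{if } \alpha \neq 0,\ \alpha \neq \pm\infty; \\[2mm] \sqrt{x} & \text{if } \alpha = 0,\ x \in (0, \infty); \\[1mm] 1 + \dfrac{1}{2}|x - 1| & \text{if } \alpha = +\infty; \\[2mm] 1 - \dfrac{1}{2}|x - 1| & \text{if } \alpha = -\infty, \end{cases} \tag{2.24}$$

and taking into account that $M^{[\alpha]}(a,b) = a M^{[\alpha]}\left(1, \frac{b}{a}\right)$, we deduce that

$$D_{f_{\alpha}}(p,q) = \int_{\chi} p(x) f_{\alpha}\left(\frac{q(x)}{p(x)}\right) d\mu(x) = \int_{\chi} M^{[\alpha]}(p(x), q(x))\, d\mu(x) = D_{M^{[\alpha]}}(p,q) \tag{2.25}$$

for all $p, q \in \Omega$, which shows that the $\alpha$-power divergence measures can be interpreted as Csiszár $f$-divergences for $f = f_{\alpha}$, which are not necessarily convex.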

The following result concerning the fundamental property of the $\alpha$-power means holds (see p. 133 and p. 159 of the cited reference).

Lemma 2. Let $a, b > 0$. Then

$$M^{[-\infty]}(a,b) \leq M^{[\alpha]}(a,b) \leq M^{[\beta]}(a,b) \leq M^{[+\infty]}(a,b) \tag{2.26}$$

for $-\infty < \alpha < \beta < \infty$.

Also,

$$\lim_{\alpha \to 0} M^{[\alpha]}(a,b) = G(a,b), \qquad \lim_{\alpha \to \pm\infty} M^{[\alpha]}(a,b) = M^{[\pm\infty]}(a,b). \tag{2.27}$$

In particular,

$$M^{[-1]}(a,b) = H(a,b). \tag{2.28}$$

Using the above lemma, we can state the following theorem concerning the location of the $\alpha$-power divergence measure.

Theorem 2. For any $p, q \in \Omega$, we have:

$$1 - \frac{1}{2} D_v(p,q) \leq D_{M^{[\alpha]}}(p,q) \leq D_{M^{[\beta]}}(p,q) \leq 1 + \frac{1}{2} D_v(p,q) \tag{2.29}$$

for $-\infty < \alpha < \beta < \infty$.

In particular, we have

$$1 - \frac{1}{2} D_v(p,q) \leq D_{Ha}(p,q) \leq D_B(p,q) \leq 1 + \frac{1}{2} D_v(p,q), \tag{2.30}$$

where $D_{Ha}(p,q)$ is the Harmonic divergence and $D_B(p,q)$ is the Bhattacharyya distance.
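A short numerical sketch of Theorem 2 follows. It assumes the counting measure on a finite set and strictly positive sample distributions (needed for negative exponents); `power_mean` and `d_pow` are illustrative names.

```python
import math

def power_mean(a, b, alpha):
    """alpha-th power mean M^[alpha](a, b) of (2.22) for a, b > 0."""
    if alpha == 0:
        return math.sqrt(a * b)
    if alpha == math.inf:
        return max(a, b)
    if alpha == -math.inf:
        return min(a, b)
    return ((a ** alpha + b ** alpha) / 2) ** (1 / alpha)

def d_pow(p, q, alpha):
    """alpha-power divergence for the counting measure."""
    return sum(power_mean(pi, qi, alpha) for pi, qi in zip(p, q))

def variation(p, q):
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

# Hypothetical sample distributions with strictly positive entries.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

alphas = [-math.inf, -5, -1, 0, 0.5, 1, 5, math.inf]
values = [d_pow(p, q, a) for a in alphas]
for a, v in zip(alphas, values):
    print(f"alpha = {a!s:>5}: {v:.6f}")
```

Here $\alpha = -1$ reproduces the Harmonic divergence and $\alpha = 0$ the Bhattacharyya distance, while the two infinite orders give the outer bounds $1 \mp \frac{1}{2} D_v(p,q)$ of (2.29).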



Remark 2. From (2.29), we may conclude the following inequality for the $\alpha$-power divergence measures in terms of the variation distance:

$$|D_{M^{[\alpha]}}(p,q) - 1| \leq \frac{1}{2} D_v(p,q) \tag{2.31}$$

for any $p, q \in \Omega$ and $\alpha \in [-\infty, \infty]$, and the constant $\frac{1}{2}$ is sharp.

In what follows, by the use of a result by Pittenger (see p. 349 of the cited reference), we obtain inequalities that relate logarithmic and power means:

Theorem 3. Let $a, b > 0$, $-\infty < r < \infty$ and define

$$r_1 := \begin{cases} \min\left\{\dfrac{r+2}{3}, \dfrac{r \ln 2}{\ln(r+1)}\right\}, & r > -1,\ r \neq 0, \\[3mm] \min\left\{\dfrac{2}{3}, \ln 2\right\}, & r = 0, \\[3mm] \min\left\{\dfrac{r+2}{3}, 0\right\}, & r \leq -1, \end{cases}$$

with $r_2$ as defined above, but with max instead of min. Then

$$M^{[r_1]}(a,b) \leq L_r(a,b) \leq M^{[r_2]}(a,b), \tag{2.32}$$

with equality iff $a = b$ or $r = 1$, $-\frac{1}{2}$ or $-2$. The values $r_1$ and $r_2$ are sharp.
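The exponents $r_1$ and $r_2$ of Theorem 3 are easy to tabulate. The sketch below is an illustrative implementation (the function names are assumptions of this example) that computes both exponents and checks (2.32) for one hypothetical pair $a, b$ and a few values of $r$.

```python
import math

def log_mean(a, b, r):
    """Generalized logarithmic mean L_r(a, b) for a, b > 0 and finite r."""
    if a == b:
        return a
    if r == -1:
        return (b - a) / (math.log(b) - math.log(a))
    if r == 0:
        return math.exp((b * math.log(b) - a * math.log(a)) / (b - a) - 1)
    return ((b ** (r + 1) - a ** (r + 1)) / ((r + 1) * (b - a))) ** (1 / r)

def power_mean(a, b, alpha):
    """Power mean M^[alpha](a, b) for a, b > 0 and finite alpha."""
    if alpha == 0:
        return math.sqrt(a * b)
    return ((a ** alpha + b ** alpha) / 2) ** (1 / alpha)

def pittenger_exponents(r):
    """Return (r1, r2): the min and max of the two candidates in Theorem 3."""
    if r == 0:
        c1, c2 = 2 / 3, math.log(2)
    elif r > -1:
        c1, c2 = (r + 2) / 3, r * math.log(2) / math.log(r + 1)
    else:  # r <= -1
        c1, c2 = (r + 2) / 3, 0.0
    return min(c1, c2), max(c1, c2)

a, b = 1.0, 3.0  # hypothetical sample values
for r in (-3, -1, -0.75, 0, 0.5, 2):
    r1, r2 = pittenger_exponents(r)
    print(f"r = {r:5}: {power_mean(a, b, r1):.6f} <= "
          f"{log_mean(a, b, r):.6f} <= {power_mean(a, b, r2):.6f}")
```

For $r = 1$, $-\frac{1}{2}$ and $-2$ the two exponents coincide, which matches the equality cases stated in the theorem.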

We are able to establish the following relationship between the power divergence and the generalized logarithmic divergence.

Theorem 4. For any $p, q \in \Omega$, we have:

$$D_{M^{[r_1]}}(p,q) \leq D_{L_r}(p,q) \leq D_{M^{[r_2]}}(p,q), \tag{2.33}$$

where $r_1, r_2$ are as defined above.

Remark 3. a) In the case $r = -1$, the corresponding inequality in (2.32) is due to Lin (see p. 349 of the cited reference) and for this $r$ we obtain from (2.33)

$$D_B(p,q) \leq D_L(p,q) \leq D_{M^{[\frac{1}{3}]}}(p,q), \quad p, q \in \Omega. \tag{2.34}$$

b) If $-2 < r < -\frac{1}{2}$ or $r > 1$, then $r_2 = \frac{r+2}{3}$ and the right hand side of (2.33) becomes

$$D_{L_r}(p,q) \leq D_{M^{\left[\frac{r+2}{3}\right]}}(p,q), \quad p, q \in \Omega. \tag{2.35}$$

Using the above means, we can construct many other divergences by combining these means in different ways. For example, we can define for $p, q \in \Omega$

$$D_{(AG)^{\frac{1}{2}}}(p,q) := \int_{\chi} \sqrt{A(p(x), q(x))\, G(p(x), q(x))}\, d\mu(x);$$

$$D_{(LI)^{\frac{1}{2}}}(p,q) := \int_{\chi} \sqrt{L(p(x), q(x))\, I(p(x), q(x))}\, d\mu(x);$$

or even

$$D_{(GI)^{\frac{1}{2}}}(p,q) := \int_{\chi} \sqrt{G(p(x), q(x))\, I(p(x), q(x))}\, d\mu(x).$$

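We recall Alzer's result for means (see p. 350 of the cited reference).

Theorem 5. If $a, b > 0$, we have

$$\sqrt{A(a,b)\, G(a,b)} \leq \sqrt{L(a,b)\, I(a,b)} \leq M^{[\frac{1}{2}]}(a,b);$$

$$L(a,b) + I(a,b) \leq A(a,b) + G(a,b);$$

$$\sqrt{G(a,b)\, I(a,b)} \leq L(a,b) \leq \frac{1}{2}\left[G(a,b) + I(a,b)\right].$$

Integrating these inequalities, we may state the following theorem concerning the above divergence measures.

Theorem 6. For any $p, q \in \Omega$, we have

$$D_{(AG)^{\frac{1}{2}}}(p,q) \leq D_{(LI)^{\frac{1}{2}}}(p,q) \leq D_{M^{[\frac{1}{2}]}}(p,q), \tag{2.36}$$

$$D_L(p,q) + D_I(p,q) \leq 1 + D_B(p,q), \tag{2.37}$$

$$D_{(GI)^{\frac{1}{2}}}(p,q) \leq D_L(p,q) \leq \frac{1}{2}\left[D_B(p,q) + D_I(p,q)\right]. \tag{2.38}$$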
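The three inequalities of Theorem 6 follow by summing (or integrating) the pointwise inequalities of Theorem 5, and this can be illustrated numerically. The sketch below assumes the counting measure on a finite set and hypothetical sample distributions whose entries differ in every coordinate; the helper `means` is an illustrative name.

```python
import math

def means(a, b):
    """Return (A, G, L, I): arithmetic, geometric, logarithmic and
    identric means of positive numbers a != b."""
    A = (a + b) / 2
    G = math.sqrt(a * b)
    L = (b - a) / (math.log(b) - math.log(a))
    I = math.exp((b * math.log(b) - a * math.log(a)) / (b - a) - 1)
    return A, G, L, I

# Hypothetical sample distributions with p_i != q_i for every i.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]
rows = [means(pi, qi) for pi, qi in zip(p, q)]

d_b = sum(G for A, G, L, I in rows)                  # Bhattacharyya distance
d_l = sum(L for A, G, L, I in rows)                  # Logarithmic divergence
d_i = sum(I for A, G, L, I in rows)                  # Identric divergence
d_ag = sum(math.sqrt(A * G) for A, G, L, I in rows)
d_li = sum(math.sqrt(L * I) for A, G, L, I in rows)
d_gi = sum(math.sqrt(G * I) for A, G, L, I in rows)
d_m_half = sum(((math.sqrt(pi) + math.sqrt(qi)) / 2) ** 2
               for pi, qi in zip(p, q))              # power divergence, alpha = 1/2

print("(2.36):", d_ag, "<=", d_li, "<=", d_m_half)
print("(2.37):", d_l + d_i, "<=", 1 + d_b)
print("(2.38):", d_gi, "<=", d_l, "<=", 0.5 * (d_b + d_i))
```

The gaps in these inequalities are small when the two distributions are close, so the printed values agree to several decimal places on this example.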

Remark 4. In this way, we have shown that any result for special means can be carried over to the divergence measures generated by these means, providing a very rich universe of facts for comparing the new distances with the other distances which have already become classics in Information Theory: the Bhattacharyya distance $D_B(p,q)$, the Harmonic distance $D_{Ha}(p,q)$, the variation distance $D_v(p,q)$, etc.